-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Controversial Freedb support #138
Conversation
Thanks for working on this. I have no opinion, but wanted to let you know it's appreciated regardless. Let's see what other think, or perhaps have already said. |
morituri/command/cd.py
Outdated
self.parser.add_argument('-U', '--unknown', | ||
action="store_true", dest="unknown", | ||
help="whether to continue ripping if the CD is unknown", | ||
default=False) | ||
self.parser.add_argument('--allow-freedb', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the short option '-F' for FreeDB Fallback?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather not make using this option easy and didn't want to reserve a 1-letter option for a somehow "legacy" mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All options should have a short form.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many don't (--cdr
, --track-template
, --disc-template
in this module)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something we're planning on fixing shortly, although I understand that you may not be privy to transient information on IRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also can't argue there's a consensus on this topic, I just strongly prefer BSD-style interfaces; so do what you want.
morituri/command/cd.py
Outdated
|
||
allow_freedb = getattr(self.options, 'allow_freedb', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a default, so I think allow_freedb = self.options.allow_freedb
is appropriate. Better yet I'd ditch the temporary variable and just use if self.options.allow_freedb:
on line 147.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I have no idea what pattern I was trying to copy here...
Edit: ah, that's copied from the "unknown" code below.
morituri/command/cd.py
Outdated
freedb_data = sfdb.read(match['category'], match['discid']) | ||
self.program.metadata = self.program.craftMusicBrainzFromFreeDB(freedb_data) | ||
|
||
if not self.program.metadata: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following the convention of failing early:
if matches:
...
if not self.options.allow_freedb and not self.options.unknown:
logger.critical(...)
return -1
logger.warning(...)
freedb_data ...
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep the diff as small as possible, and I'm not sure I'm following what you are trying to restructure here. I'd rather have such change on top of the proposed change if you don't mind.
morituri/command/cd.py
Outdated
sfdb = simplefreedb.SimpleFreeDB() | ||
matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1]) | ||
if matches: | ||
match = matches[0] # TODO: honor --prompt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you take a gander at the work I did to rewrite AccurateRip, I factored out a generic prompt function you might find useful here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
git grep -i prompt
doesn't seem to raise anything related. Can you point out where I can find it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it was @ RecursiveForest@ca33dcf#diff-aa28dbdd5d3fa03298c07270deac1ec1R40 , not part of the AR PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not mainline, or is it?
morituri/common/simplefreedb.py
Outdated
import urllib | ||
import morituri | ||
|
||
class SimpleFreeDB: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know morituri is organised this way, but since you're not storing any dynamic data on the class, I think this entire module might be a good candidate for a more functional style. It'll read pretty much the same way, but indented 4 characters less and with self
less signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I think I prefer using the class system for at least namespace. Python is way too much a free4all language already, so it helps with the scoping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand-- modules provide namespacing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends how you import it. Anyway, I'm just trying to be consistent with the common way of writing Python. Do you think it's really a problem to wrap an API in a class?
morituri/common/simplefreedb.py
Outdated
|
||
def _craft_match(self, categ, discid_str, dtitle): | ||
artist, title = self._split_dtitle(dtitle) | ||
return {'category': categ, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just spell out 'category'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
morituri/command/cd.py
Outdated
allow_freedb = getattr(self.options, 'allow_freedb', False) | ||
|
||
sfdb = simplefreedb.SimpleFreeDB() | ||
matches = sfdb.query(cddbid[0], cddbid[1], cddbid[2:-1], cddbid[-1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the cddbid parsing logic should be done inside of sfdb.query(), because this is pretty opaque. If you know the sfdb.query() signature, sure you can work it out, but outside of your test cases it doesn't seem like you're ever calling sfdb.query() without having an already composed cddbid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have getCDDBValues()
not build a custom array of stuff. It would indeed be simpler here to just pass cddbid
, but the way result
is crafted in getCDDBValues()
makes me believe it was originally written by someone who had no clue about data structures. If you consider SimpleFreeDB
could live outside the whipper
project, it is saner to have this prototype, because the parameters are constructed separately: there is no such thing as "grab all the info and pack them in an array that way" thing outside of whipper
.
Edit: I split the unpacking of the values earlier in this code to clarifies the call.
morituri/command/cd.py
Outdated
@@ -279,15 +295,20 @@ def add_arguments(self): | |||
self.parser.add_argument('--track-template', | |||
action="store", dest="track_template", | |||
default=DEFAULT_TRACK_TEMPLATE, | |||
help="template for track file naming (default default)") | |||
help="template for track file naming") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these (and the above whitespace change in getPath) are both good changes, but I think don't belong in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whitespace change seems to have been included already by some other change. I'll take out the default-default patch out of it and make a new pull request in a moment.
Edit: PR @ #188
morituri/common/program.py
Outdated
def getCDDB(self, cddbdiscid): | ||
""" | ||
@param cddbdiscid: list of id, tracks, offsets, seconds | ||
def craftMusicBrainzFromFreeDB(self, freedb_data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A short docstring explaining that this is creating musicbrainz-sytle metadata from a freedb entry would make this function perfect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do.
morituri/common/program.py
Outdated
if number > 0: | ||
has_mbinfo = mbidTrack and mbidTrackArtist and mbidAlbum \ | ||
and mbidTrackAlbum and mbDiscId | ||
if number > 0 and has_mbinfo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be worth assigning tags['...TRACKID'] et al. within individual if branches instead of all at once? I'm not sure if it's useful or not to have, say, ARTISTID and not ALBUMID, or if that would ever happen, but it might be worth it? shrugs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this if
is protecting the insert of MusicBrainZ tags, and you absolutely don't want them when using --allow-freedb
. This code basically checks for every field that is supposed to be present with MusicBrainZ; if any is missing, that's because we're hacking FreeDB stuff in it and so you don't want to end up with broken MusicBrainZ references obtained from dubious sources.
morituri/common/simplefreedb.py
Outdated
|
||
# Copyright (C) 2017 Clément Bœsch <u@pkh.me> | ||
|
||
# This file is part of morituri. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whipper!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I'm in the process of rebasing that stuff, it was morituri when I wrote that patchset. I won't forget to fix that :)
morituri/common/simplefreedb.py
Outdated
# -*- Mode: Python; test-case-name: morituri.test.test_common_program -*- | ||
# vi:si:et:sw=4:sts=4:ts=4 | ||
|
||
# Morituri - for those about to RIP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RIP morituri-- it's whipper now >:D
morituri/common/simplefreedb.py
Outdated
import re | ||
import socket | ||
import getpass | ||
import urllib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it requires another dependency (although one that we may introduce for ARv2 support regardless), but it might be worth using the 'requests' library in lieu of urllib here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? urllib is builtin. I got so many issues with requests...
morituri/common/simplefreedb.py
Outdated
return data | ||
|
||
def main(): | ||
test_queries = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to decouple testing from the actual code by creating an accompanying test module, like the rest of the application.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the little assert, this is not actually a test but more a debug tool (it's not testing output, it's dumping data). I'd prefer to keep tests depending on external websites such as freedb.org outside of the test suite.
Overall this is pretty solid. I left a few notes regarding style and organisation, but the core functionality looks pretty good. I'm not convinced yet it's worth the potential feature creep, especially because we want to encourage MusicBrainz contributions, but I can see a case being made for its inclusion. You could also consider writing a small tagging tool on top of simplefreedb that applies freedb-sourced metadata to flacs that were ripped by whipper with the --unknown flag. Would you consider making simplefreedb into its own independent python module that could be developed independently of whipper, which we would then import? The two existent cddb modules for python are not particularly well packaged (and we know at least one is lacking in UTF-8 support, apparently), so there's definitely a place for it. This is pretty clean code. |
I don't think I'll be motivated enough to do all the packaging effort, I'm sorry. Feel free to do so though. Thanks for the long review, I'll make the adjustments you requested soon. |
cb0550c
to
d24f918
Compare
i would argue for this even if only for archival purposes. here i'm using whipper to rescue an old personal CD collection and i happened to have tagged some of those disks on freeDB. they have very low distribution (this is probably the only copy left in the universe) so there's little point in repopulating musicbrainz with this stuff... |
@anarcat Well, that's a pretty niche use case, isn't it? (considering that you can also tag those files after whipper has finished its job) Anyway, right now we aren't using the |
just to weigh another point in - what i ended up doing here is to feed the data back into musicbrainz. there are two ways of viewing this:
i'd rather not encourage the second pattern, but it's true that we might not encourage people to contribute to musicbrainz if data is only on freedb.... |
Related to #44. |
I'd argue that that means that there's even more point in feeding that data into MusicBrainz. High distribution stuff will likely make its way there eventually anyway, but MusicBrainz wants information on all music. If you're the only one left who can "tell" us about a given release or maybe even artist, then we'd be all the more appreciative of you doing so. (Keep in mind that MusicBrainz is a database/encyclopedia first, any uses of the data is secondary. There are people using the MB data from cleaning up or expanding on other datasets (one of the common use cases for the streaming services that are MetaBrainz supporters), and maybe there's a copy of some of the music you have access to floating around on YouTube or somewhere that would benefit from being linked up to MBIDs. There are a bunch of researchers around the world using MusicBrainz to look into various questions related to music information, needless to say, the more information there is, the better/more accurate results they'll get. Just because there may not be another person to rip the disc anytime in the future, doesn't mean that the value of the disc being in MusicBrainz is zero or even insignificant.) |
It would be really nice to add an option to allow grabbing metadata from freedb. It's completely ok if it doesn't by default but it would really help with the adoption of whipper imo. |
@Freso @MerlijnWajer What's your opinion about this? |
Considering that freedb.org is sunsetting and seems like it will be shut down in a few months, I vote to reject this PR and any other attempt of freedb compatibility. Even if it gets resurrected (on the same domain or elsewhere), the technology behind it is ancient and largely in maintenance-only mode (if even that much!). Let’s move on. |
@Freso Thanks for the reply: didn't know about the imminent shutdown! For this reason I'm closing the pull request. |
That's sad, there are a lot more releases on freedb than MB, and if someone hosts a mirror (and they already exist) it could still be used. In my experience having only MB is pretty much equivalent to having no metadata at all unfortunately. |
That was pretty much my experience and the reason I hacked this branch. Out of all the CDs I tried to RIP with whipper (about 20), almost none where found on MB, while almost all of them were on freedb. |
Same here: what brought me here was the daunting task of ripping my old CD collection out of fear it would dissolve itself in the mists of time (which it partly did). Because many of those releases predate the very existence of MB, a lot of those releases are only on FreeDB.org... But I guess if the site dies, it's not worth keeping this implementation around. @peioe which mirror are you aware of? |
Here's another controversial thought - what about adding gracenote support? They pretty much "inherited" their data from FreeDB.org anyway. (I might already have code for this) |
I’m principally opposed to supporting a closed source, proprietary data source. (Also, Gracenote were the ones responsible for shutting off contributors’ access to CDDB 20 years ago, leading to the creation of FreeDB, MusicBrainz, et al.) |
http://www.freedb.org/en/faq.3.html#18 EAC for example allows you to specify the url to a freedb server, so you can just host your own. Even if freedb goes down tomorrow, the data it collected over the years is still extremely useful, and if it's possible to host a mirror it would be great to be able to use it with whipper. |
OK so I understand you probably don't want this, but I'm submitting anyway. I'll maintain my crap in my fork in case it's rejected.
The
python-cddb
dependency drop is controversial because I'm actually adding the support to CDDB natively. The main benefit aside from having one less dependency is to have proper unicode support. Note: I should probably move themain()
to the test suite.The second commit introduces the
--allow-freedb
option in which we craft fake MusicBrainz metadata. Note that this option is NOT the default, and that it also warns the user about how evil this is.